ABSTRACT

This report is the beginning of a series that will
examine the performance of the school children of New Jersey relative
to the performance of other children in the United States and
worldwide. The measure of performance used was the 1992 National
Assessment of Educational Progress (NAEP) mathematics examinations
and their linked versions of the 1991 International Assessment. The
NAEP is used because it is satisfactory in content and form and
because the psychometric model underlying its scoring yields a single
scale that allows easy comparison. Students sampled by the NAEP are
drawn in a principled way from the populations of interest,
supporting the accuracy and honesty of the assessment. Based on
unstandardized results of the 1992 mathematics assessment, New Jersey
was among the highest performing states. When results were
standardized to reflect a single national demographic composition,
New Jersey's rank among participating states rose to fourth. The
United States, however, finished next-to-last when compared with the
other 14 nations. The top 5% of New Jersey students ranked third when
compared with the top 5% of members of the Organization for Economic
Cooperation and Development. Seven figures illustrate the discussion.
(Contains 10 references.) (SLD)

On the Academic Performance of New Jersey's Public School Children:
I. Fourth and Eighth Grade Mathematics in 1992 

Howard Wainer
Educational Testing Service 

Introduction 

"...teaching is validated by the transformation of the minds
and persons of the intended audience."

Bressler, 1991 

The most critical measure of any educational system is the performance of its
students. But what yardstick should be used to accomplish this measure? The fact that
modern education has many goals suggests that we must measure the extent of its success
in a variety of ways. This report describes the first of a series of researches that will
attempt to characterize the performance of New Jersey's public school system. We will do
this through comparative and absolute measures, the primary instrument of which will be
the data gathered during the course of the National Assessment of Educational Progress
(NAEP).

NAEP is a congressionally mandated survey of the educational achievement of
American students and of changes in that achievement across time. Although this survey
has been operational for nearly 25 years, it was only in 1988 that Congress authorized
adding state level surveys to the national assessment. This was begun on a trial basis, with
states participating voluntarily. In 1990, 37 states, two territories and the District
of Columbia participated in the first Eighth Grade Math Assessment. In 1992 seven more
states joined the state assessment, yielding 44 jurisdictions. The 1992 assessment was
also expanded to include the fourth grade. In this report we shall focus attention only upon
the 41 states in the assessment. Guam, the Virgin Islands and the District of Columbia
will be explicitly excluded because they are sufficiently different from the states in their
size, character and composition as to distort most comparisons.

The assessment methodology is technically sophisticated. Through the use of
linking items and item response theory, the performance of all students participating in
the assessment can be placed on the same numerical scale. Measuring students' growth is
thus straightforward: subtracting 4th grade scores from 8th grade scores gives the growth
obtained. Consequently the expansion of the assessment to the fourth grade provides us
with two important bits of information. The first is a measure of how much mathematics
fourth graders know. The second is a measure of how much mathematics is learned between
4th and 8th grade. Note that having a measure of the gain obtained (about 49 NAEP
points on average) helps us to interpret the NAEP scale. Spread over the four years
between the two grades, it tells us that if one state trails another by about 12 points
this is about the same as the average gain in one year of
school. Thus, when we compare California's mean 8th grade NAEP score of 261 to New
Jersey's score of 273, we can interpret the 12 point difference as indicating that the
average California 8th grader performs about the same in mathematics as the average
New Jersey 7th grader would have. This helps give additional meaning to the numerical
scale.
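
The arithmetic behind this reading of the scale can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and the sketch assumes that the 49-point gain is spread evenly over the four school years, which the assessment itself does not establish.

```python
# Figures taken from the text: the average gain between 4th and 8th grade
# is about 49 NAEP points, and four school years separate the two grades.
GAIN_4TH_TO_8TH = 49.0
YEARS_BETWEEN_GRADES = 4


def points_per_year(total_gain=GAIN_4TH_TO_8TH, years=YEARS_BETWEEN_GRADES):
    """Average NAEP points gained per school year (assumes even growth)."""
    return total_gain / years


def gap_in_years(score_a, score_b):
    """Express the difference score_a - score_b in approximate school years."""
    return (score_a - score_b) / points_per_year()


new_jersey_8th = 273  # mean 8th grade score quoted in the text
california_8th = 261  # mean 8th grade score quoted in the text

print(round(points_per_year(), 2))                             # 12.25
print(round(gap_in_years(new_jersey_8th, california_8th), 2))  # 0.98
```

Under that assumption one school year corresponds to a little over 12 points, so the 12 point difference is just under one year of average growth.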

More meaning still for the eighth grade math assessment is yielded by comparing
performance on it with performance of 13 year old students in the 1991 International
Assessment. Because the NAEP Math assessment was coordinated with the International
Assessment, both sets of scores can, with reasonable confidence, be placed on the same
scale. This was accomplished by having a common sample of examinees for both
assessments. As we shall see, the performance of New Jersey's students compares
favorably with those from the developed nations.

The Mathematics assessment contained tasks for the students drawn from the
framework provided by the Curriculum and Evaluation Standards for School
Mathematics, developed by the National Council of Teachers of Mathematics. The
content and the structure of the assessment have been widely praised as being representative of
the best that current knowledge and technology allow. A full description of the 1992
Mathematics Assessment is found in NAEP 1992: Mathematics Report Card for the
Nation and the States (Mullis, Dossey, Owen, & Phillips, 1993).

The NAEP State Assessment Sample 

Within each state 100 public schools are carefully selected to be representative of
all public schools in that state. Within each school at least 30 students are chosen at
random to be tested (in larger schools this number can be as large as 90). Students (usually
of foreign birth) whose English language proficiency is deemed to be insufficient to deal
with the test are excluded from the sampling frame.

The Results 

All results are reported on a uniform scale that can meaningfully characterize the
performance of students over a very wide range of proficiency. This scale can be used in
a normative manner, for example comparing one state with another, or one state with the
nation as a whole. Or it can be used as an absolute measure, since expert judges have
provided a correspondence between score levels and specific proficiencies. These
proficiencies are denoted Basic, Proficient and Advanced. What is required to perform
at each of these levels obviously increases as the student progresses through school, but
the levels are always referred back to the five NAEP content areas. These are: (1) numbers and
operations, (2) measurement, (3) geometry, (4) data analysis, statistics, and probability,
and (5) algebra and functions.

For example, a score of 211 is characterized as "Basic Level" fourth grade
performance. "Basic Level" is defined as "showing some evidence of understanding the
mathematical concepts and procedures of the five NAEP content areas." The second
level is called "Proficient" and is located at score 248 and reflects being able to
"consistently apply integrated procedural knowledge and conceptual understanding to
problem solving in the five NAEP areas." The highest performance level is termed
"Advanced", is located at score 280 and reflects the ability to "apply integrated
procedural knowledge and conceptual understanding to complex and nonroutine
real-world problem solving in the five NAEP areas."
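
The use of the scale as an absolute measure can be illustrated with a small lookup. The cut scores are the fourth grade values quoted above. The label "Below Basic" for scores under 211, and the convention that a score exactly at a cut score reaches that level, are assumptions made for this sketch.

```python
from bisect import bisect_right

# Fourth grade cut scores quoted in the text.
CUT_SCORES = [211, 248, 280]

# "Below Basic" is an assumed label for scores under the first cut score.
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]


def achievement_level(score):
    """Return the achievement level whose cut score the given score has reached."""
    # bisect_right counts how many cut scores are <= score,
    # so a score exactly at a cut score is placed in that level.
    return LEVELS[bisect_right(CUT_SCORES, score)]


for score in (200, 211, 247, 248, 280):
    print(score, achievement_level(score))
```

Running the loop classifies 200 as Below Basic, 211 and 247 as Basic, 248 as Proficient and 280 as Advanced.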

The mean performance of all participating states for the 8th grade assessment is 
shown in Figure 1. 



Insert Figure 1 Here



The results shown in Figure 1, while accurately reflecting the actual mean
performance within each state, may not be appropriate for certain kinds of state-by-state
comparisons. The student populations of the states differ in their demographic make-up.
As such, some states face more difficult challenges in educating their populations than
others. One obvious example of such a situation occurs in states like California, New
Jersey and Florida that have large immigrant populations whose children, even if they do
not participate directly in the assessment, require a larger share of instructional resources
than native English speakers. In addition, the various subpopulations of students in each
state often perform very differently from one another. For example, Figure 2
displays the mean performance of students in different parts of the country broken down
by race/ethnicity.

Figure 2 has two important messages: 

1. There are very large differences in performance by ethnic group. These differences are
much larger than the geographic variation observed.

2. New Jersey's students perform better than the national average and all regional
averages for all groups. Thus although it is true that New Jersey's African-American and
Hispanic students do worse than White students, they do better than African-American
and Hispanic students in any region.



Insert Figure 2 Here 



In addition to the widely different performances of the various demographic
subgroups, the distribution of these groups is not uniform across all states. A brief summary
of these distributions is shown in Table 1. As is evident, New Jersey's racial/ethnic
distribution is rather close to that of the nation as a whole. The central states are the most
deviant in the sense that they have a substantially larger proportion of their student
population that is White.

NAEP 1992 Trial State Assessment

Percentage Race/Ethnic Representation in NJ
Compared to that in other parts of the Nation

                         Race/Ethnicity
Mathematics    White    Black    Hispanic    Asian    Other

NATION           69       17        10          2        2
Northeast        68       20         9          2        1
Southeast        63       29         5          1        2
Central          79       11         7          1        2
West             65       11        16          5        3
NJ               67       14        13          5        1

Table 1. The national and regional racial/ethnic distribution.



If we wish to use such data to draw inferences about the relative efficacy of a
state's schools, it is considered good practice to statistically adjust for the demographic
differences. Why is it helpful to make such adjustments? It is beyond the goals of this
report to investigate fully why there are differences in performance by demographic
group, although there is a rich literature of fact and conjecture that attempts to do so¹.
However, to understand why we need to make a statistical adjustment it is important to
provide some explanation. To do so requires that we draw the important distinction
between education and schooling. The school is only one agency among many (family,
church, neighborhood, mass media) that provide children with windows on the world.
Mass schooling was invented because families could no longer perform essential
educational functions. "Once upon a time schools could proceed on the only partly
fictitious assumption that, in their efforts to teach children, they were supported by
relatively stable families, and by neighborhoods that enforced elementary standards of
civility."² Not only is this much less true now than in the past, but it is also less true
within some demographic groups than others. NAEP measures education and not just
schooling, but inferences about the differences among states are explicitly about schools.
To add some validity to those inferences we must try, statistically, to place all states on a


¹The Coleman report (Coleman et al., 1966) remains the most encyclopedic of such investigations,
summarizing, as it does, the performance of more than 645,000 children in 4,000 public schools. It arrives
at the not surprising conclusion that family and economic variables drive educational achievement.
²This quote and much of the surrounding logic comes from Marvin Bressler's delightful and wise 1992
essay, "A teacher reflects."
level playing field with respect to their children's nonschool educational opportunities.
Adjusting for differences in demographic groupings is a crude beginning. 

What follows is a methodology that recognizes that such differences exist, and a 
statistical technology that attempts to partition state differences due to demographics 
from those due to differences in school performance. 

Interpretable comparisons through statistical standardization 

The between-state comparisons that are implicit in Figure 1 can yield misleading
inferences if one is not acutely aware of the differences in the demographic make-up of
all of the constituent units. These differences are so complex that it is impossible to keep
things straight without some formal adjustment. One accepted way to accomplish this
adjustment is termed 'standardization' (Mosteller & Tukey, 1977, p. 223). The basic idea
of standardization is to choose some specific demographic proportions as the standard
and then estimate each state's mean proficiency on that specific mixture. In this instance
it is sensible to choose the configuration of the entire United States as the standard
mixture. Thus the estimated score for each state will be the answer to the question, "What
would the national average be if all children went to school in this state?"

How is this adjustment accomplished? It is very simple in theory, although
sometimes, because of peculiarities in sampling weights, a bit tricky to execute. We take the
mean score obtained in a state for a particular subgroup and multiply it by that subgroup's
proportional representation in the standard (national) mixture. Do this for all subgroups,
sum the products, and the resulting score is the adjusted one. So far we have reported New
Jersey's scores for four racial/ethnic groups. As we have seen, because New Jersey's
demographics are so much like the national standard this sort of adjustment would have
little effect. A greater effect would be seen in more disparate areas (e.g., the central U.S.).
Is it sufficient to adjust simply on the basis of this one demographic variable? No, although
if we adjust on too many variables, and so include some irrelevant ones, no damage will be
done, since irrelevant variables will typically not show differences in performance. In this
paper we adjust on three variables. These are:

1. Race/ethnicity - Five categories: White, African-American, Hispanic,
Asian/Pacific Islander, Other

2. Type of community - Four categories: Extreme Rural, Advantaged Urban,
Disadvantaged Urban, Other

3. Limited English Proficiency - Two categories: Yes, No

This resulted in dividing each state up into forty pieces corresponding to the forty
possible combinations (5x4x2) and calculating the mean proficiency within each of those 40
groups. These 40 means were then weighted by their representation in the entire nation,
yielding a standardized score for each state. The results of this standardization are shown
in Figure 3.
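
The standardization just described can be sketched as follows. This is a minimal illustration, not the procedure actually run on the NAEP sampling weights. The function and variable names are ours, the subgroup means and the state's own shares are invented placeholders, and the handling of cells in which a state has no students (dropping them and rescaling the remaining national shares) is an assumption of the sketch. For brevity the worked example standardizes on race/ethnicity alone, using the national shares from Table 1.

```python
from itertools import product

RACE = ["White", "African-American", "Hispanic", "Asian/Pacific Islander", "Other"]
COMMUNITY = ["Extreme Rural", "Advantaged Urban", "Disadvantaged Urban", "Other"]
LEP = ["Yes", "No"]

# The 5 x 4 x 2 cross-classification used in the report.
CELLS = list(product(RACE, COMMUNITY, LEP))
print(len(CELLS))  # 40


def weighted_mean(cell_means, shares):
    """Weight each cell mean by its share; cells missing from cell_means
    are dropped and the remaining shares rescaled (an assumed convention)."""
    usable = [cell for cell in shares if cell in cell_means]
    total_share = sum(shares[cell] for cell in usable)
    if total_share <= 0:
        raise ValueError("no overlap between cell means and shares")
    return sum(cell_means[cell] * shares[cell] for cell in usable) / total_share


# National race/ethnicity shares from Table 1, as proportions.
national_shares = {"White": 0.69, "African-American": 0.17, "Hispanic": 0.10,
                   "Asian/Pacific Islander": 0.02, "Other": 0.02}

# Invented subgroup means and shares for a made-up, mostly White state.
state_means = {"White": 280, "African-American": 245, "Hispanic": 250,
               "Asian/Pacific Islander": 290, "Other": 260}
state_shares = {"White": 0.90, "African-American": 0.03, "Hispanic": 0.04,
                "Asian/Pacific Islander": 0.01, "Other": 0.02}

raw = weighted_mean(state_means, state_shares)              # the state's own mix
standardized = weighted_mean(state_means, national_shares)  # the national mix
print(round(raw, 2), round(standardized, 2))  # 277.45 270.85
```

In this invented example the standardized mean is lower than the raw mean because the made-up state has a larger White share than the nation, which is the direction of adjustment the text describes for the more homogeneous states.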



Insert Figure 3 Here 



After standardization to national demographic norms we find that although New
Jersey's mean score has not changed much, ten other states with more homogeneous
student populations (e.g., Massachusetts, Wisconsin, Iowa, Idaho, North Dakota) that had
previously been slightly higher are now ranked equal to or below New Jersey.

What is the point of standardization? There are many reasons. So far we have
mentioned just one: making comparisons between states on the basis of their children's
performance on the same tasks and not on the differences in the demographic structure of
their population. A second, and oftentimes more important, use of standardized scores is
in easing the difficulties in making inferences about changes that occur within a state
across time. When changes do occur the standardized scores assure that the change
reflects changes in the students' performance and not changes in the demographic structure
of the state. We expect that as time goes on this will be the aspect of greatest value of the
standardization.³

A natural question to ask is, "At what age do the differences observed among the
states manifest themselves?" If we see the same difference between two states in 4th
grade as we do in 8th, it implies that the lower scoring state needs to place more emphasis
on learning in lower grades. If the difference observed in 4th grade grows proportionally
larger in 8th it means that the deficit is spread throughout the years of school and a more
systemic change is needed for improvement. Trying to make inferences of this sort based
on just two time points is risky, but is certainly instructive. Shown in Figure 4 are the
standardized scores for the 1992 4th grade math assessment. A comparison with the 8th
grade rankings shown in Figure 3 indicates that the positions established in 4th grade are
maintained and the differences observed between states increase. The range of 24 NAEP
points observed between the relatively extreme states of North Dakota and Mississippi in
8th grade was 13 NAEP points in 4th grade. One way to interpret this is that the average
Mississippi 4th grader was a year behind the average North Dakota 4th grader in math,
and by the time they both reached 8th grade this deficit had increased to two years.



Insert Figure 4 Here 



³A caveat: Big changes as a result of a statistical adjustment tell us that great care must be exercised in
making inferences. A careful comparison of Figure 1 and Figure 3 reveals that most states do not change
very much. This is evidence that the standardization is generally behaving as it ought, for if one disagrees
with the structure of the adjustment one can still be content that it isn't changing anything drastically. A
notable exception to this would be the District of Columbia, whose small size and atypical demographic
structure would combine to yield an enormous shift. Inferences about the meaning of its standardized
location ought not be the same as those drawn about the states. For the more important purpose of tracking
changes in a jurisdiction's performance over time, it is probably prudent to develop a special
standardization for each of the four most unusual jurisdictions (DC, Hawaii, Guam, Virgin Islands).

By subtracting the scores shown in Figure 4 from those in Figure 3 we obtain estimates of
the average growth exhibited in each state. This result is shown in Figure 5 below. All
states' scores are standardized to the demographic structure of the nation as a whole. Thus
were these results longitudinal rather than cross-sectional, we would be able to interpret
the changes as due entirely to growth and not demographic changes. As they are now
constituted these changes in scores are due to differences in performance and not to
demographic differences in the two grades.
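
The subtraction that produces these growth estimates can be sketched as below. The two states and their scores are invented placeholders and are not read from Figures 3 or 4; the function name is ours.

```python
def growth_by_state(grade8, grade4):
    """8th grade minus 4th grade standardized mean, for states present in both."""
    return {state: grade8[state] - grade4[state]
            for state in grade8 if state in grade4}


# Placeholder standardized means for two made-up states (not real NAEP results).
grade4 = {"State A": 220, "State B": 210}
grade8 = {"State A": 270, "State B": 255}

growth = growth_by_state(grade8, grade4)

# List states from largest to smallest estimated growth.
for state, gain in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print(state, gain)
```

Because both sets of means are standardized to the same national mixture, a difference computed this way is not affected by the two grades having different demographic compositions.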



Insert Figure 5 Here 



International Comparisons 

As mentioned previously, the 1991 International Assessment contained enough
NAEP items to allow accurate comparisons. The most newsworthy result was that the
United States finished near the bottom in this assessment, finishing ahead of Jordan but
behind all of the participating developed nations. This was (properly) viewed with alarm.
But, as we have seen in the preceding figures, there is tremendous variation within the
United States. Shown in Figure 6 are the results of this assessment augmented by the
inclusion of New Jersey (standardized to national demographics). As is evident, New
Jersey's students' performance was sixth among all nations participating in the
assessment. Further details of the International Assessment can be found in Salganik,
Phelps, Bianchi, Nohara, & Smith (1993).



Insert Figure 6 Here 



Interpretation of this figure is helped by remembering that, on average, students advance 
roughly 12 NAEP points a year. Thus the average student in Taiwan and Korea is about 
a year ahead of the average New Jersey student, who is within a month or two of the 
other developed nations. 

Thus we see rather dramatically that, because of the great diversity within the United
States, looking at just an overall figure for the entire country provides an incomplete and,
for some purposes, misleading picture. Because New Jersey's score is standardized to the
demographic structure of the entire nation, one can interpret this result as what the nation's
location would have been if all of the states' educational systems performed as well as
New Jersey's.




Within state variation 

We have seen that the variation among states (roughly 30 NAEP points from
highest to lowest) makes interpretation of a national mean of limited value. In the same
way, the variation within states dwarfs the variation between them. In most states the
average score obtained by the lowest 10% of the students is more than 90 NAEP points
lower than the score obtained by the top 10%⁴. 90 points is an enormous gulf. Before
trying to understand the reasons for this great disparity (with an eye toward developing
strategies for ameliorating it) it will be useful to continue this series of comparisons for
one important segment of the population: the very top.

Figure 7 compares the performance of the top 5% in the 1992 8th
grade math assessment with the top 5% of the various OECD countries. We see
immediately that New Jersey's top 8th grade students compare favorably with their counterparts
throughout the world. The United States as a whole has also improved relative to the
other countries, but still lags the other developed nations by 3 to 12 months.



Insert Figure 7 Here 



Summary & Conclusions 

This report is the beginning of a series that examines the performance of New
Jersey's school children relative to other children within the United States and
worldwide. The measure of performance used was the 1992 National Assessment of
Educational Progress mathematics exams and their linked versions used in the 1991
International Assessment. This is done in the ardent belief that the efficacy of schools
must be measured by the performance of their students. We chose NAEP for several
reasons, three of which were:

1. It is composed of test items that satisfy the best of current wisdom with respect
to both their content and their form.

2. The psychometric model underlying the scoring of NAEP yields a single scale
on which not only can the fourth and eighth graders be characterized,
but also the 13 year olds from the OECD countries from around the world.

3. The students sampled by the NAEP are drawn in a principled way from the
populations of interest. This is in sharp contrast to the sorts of self-selected
samples that are represented by state means of such college admission tests as
the SAT and the ACT. It is well known that trying to draw inferences of


⁴In all but one of the OECD countries this gulf between the 10th percentile and the 90th is somewhat
smaller, about 70 points. Taiwan is the lone exception, with a difference of 96 points.



useful accuracy from such self-selected samples is impossible (Wainer, 1986a, 
b; 1989a, b). 

We concur with prevailing expert opinion that of all broad-based tests NAEP provides the
most honest and accurate estimates of the performance of the students over the broad
range of jurisdictions sampled.

We found that, based on the unstandardized results of the 1992 Mathematics
Assessment, New Jersey was among the highest performing states. Once these results
were standardized to reflect a single (national) demographic composition, New Jersey's
rank among the participating states increased to fourth.

The United States finished next to last when the performance of its students was
compared with that of the students in the other 14 participating OECD nations in the 1991
International Assessment. New Jersey's students were ranked sixth on the same
assessment when their performance was placed on the same scale. However, New Jersey's
best students, its top 5%, when compared with the performance of the top 5% of all other
OECD nations, ranked third, trailing only Taiwan and Korea.

Figure 1. A stem & leaf display of the 1992 NAEP State Assessment in 8th Grade
Mathematics. These results are the raw (unstandardized) means from each state.
New Jersey ranks 14th among all participants.

NAEP 1992 Trial State Assessment
Subgroup comparisons of NJ with other parts of the Nation

Figure 2. A stem & leaf depiction comparing the performance of New Jersey's students,
broken down by race/ethnicity, with similar groups from all other parts of the
country. Samples of "Asian/Pacific Islanders" were insufficient to obtain accurate
estimates for any other regions than the West.

NAEP 1992 Trial State Assessment
(Standardized for demographic differences)

Figure 3. After standardization New Jersey ranks fourth among all participating states in
the 1992 8th grade mathematics assessment.

NAEP 1992 Trial State Assessment
(Standardized for demographic differences)

Figure 4. The standardized scores for the 41 states in the 1992 4th grade math
assessment.

International 1991 Mathematics Assessment
(Predicted Proficiency for 13 year olds)

Figure 6. Placing New Jersey explicitly into the 1991 International Assessment shows
that its students performed above the average level of most developed nations.

International 1991 Mathematics Assessment
(Predicted Proficiency for 95th %ile of 13 year olds)

Figure 7. New Jersey's top students rank third in the world in the 8th grade math
assessment.






